{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# COMPSCI 389: Homework 1\n",
    "\n",
    "**Assigned**: February 14, 2024. **Due**: February 22, 2024 at 2:00pm Eastern. **Note**: Submissions received after 2:00pm Eastern on February 29, 2024 will receive no credit.\n",
    "\n",
    "**Submitting**: Upload your submission on Gradescope as a `.pdf`. Converting to a PDF can be a complicated process, so we encourage you to test it well in advance of the submission deadlines. We recommend converting to HTML, opening the HTML file in a browser, and then printing or exporting to a PDF from your browser. We do not recommend converting directly to a PDF, since this requires installing XeLaTeX. To convert to HTML in VS Code, press `ctrl+shift+p`, type `export`, and select the option to export to HTML.\n",
    "\n",
    "**Note**: Keep your `.ipynb` file, as we may request it directly (via email).\n",
    "\n",
    "**Note**: When converting to a PDF file, ensure that all of your code cells have been executed. The results of these executions *must* be included in your submitted PDF."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instructions\n",
    "\n",
    "Complete the questions below, replacing the <font color=\"blue\">blue</font> text with your own answers (your answers do not need to remain in blue). Do **not** modify the <font color=\"green\">green</font> text. Try to answer the questions without consulting your notes or any online material. If you cannot, then consult your notes, and if absolutely necessary, consult course materials (slides, notebooks) and/or Wikipedia. Do **not** use other sources or tools like ChatGPT. Complete this part of the assignment on your own (do **not** work with others).\n",
    "\n",
    "After you have completed all of the questions, open the notebook `Homework 1 Solutions.ipynb`, which is linked at the bottom of this assignment. It contains the solutions and instructions for checking that your answers are correct and sufficient. Make a second pass through your homework assignment, replacing the <font color=\"green\">green</font> text with a description of what you missed in each question and the fixes needed to make your answer correct. **The solutions file may include additional instructions, which may include additional content to respond to even if you got a question correct (e.g., additional reflection).** During this second stage, while you are replacing the <font color=\"green\">green</font> text, you may reference the solutions, work with others, and use any tools (including ChatGPT).\n",
    "\n",
    "You will submit this assignment only once, after replacing both the blue and the green text; you do not need to submit it between the first and second passes. Each question will be graded on whether you followed this process, whether your final answer is correct, and whether your final answer includes sufficient discussion. Points will be deducted if you did not make a reasonable initial effort to answer the question, if your final answer remains incorrect, or if your answers are not sufficiently clear (so write in full sentences with proper punctuation, and convey your arguments clearly). Other than verifying that you made a reasonable effort on your initial answers (<font color=\"blue\">blue</font>), points will **not** be deducted for *initial* answers being incorrect. Hence, there is no reason to break the rules to obtain correct answers initially."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 1: Short Answer\n",
    "\n",
    "Answer the following questions with at least a few sentences, and no more than roughly one page of text."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1. [10 points] What is the definition of machine learning?\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2. [10 points] What is the difference between regression and classification?\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. [10 points] Give an example of an algorithm (hypothetical or real) that falls within the field of *artificial intelligence* (AI) but **not** the field of *machine learning* (ML). Explain why this algorithm is an AI algorithm but not an ML algorithm.\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. [10 points] Give an example of a problem that is (or could be) solved using supervised learning. \n",
    "\n",
    "Select a problem that we have **not** discussed in class.\n",
    "\n",
    "Describe the problem, what the data would look like, whether the data is available (or obtainable by some organization), and whether it is a classification problem (binary or multi-class?) or a regression problem (univariate or multivariate?). When describing the features that would likely be in the data set, state whether each is numerical (discrete [binary, non-binary] or continuous), categorical (nominal or ordinal), or of some other type.\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. [10 points] Why should we avoid evaluating machine learning models using the same data that was used to train the model?\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 6. [10 points] Propose a new (not discussed in class) evaluation metric for regression.\n",
    "\n",
    "Give an equation for the new metric, and explain what it does. Does it have potential benefits for some problems? Does it have potential drawbacks?\n",
    "\n",
    "**Hint**: If you're stuck, look at the comment in the source of the markdown cell at the bottom of this document. You can do this during the initial answer phase.\n",
    "\n",
    "**Note**: If you are unsure how to write your equation in LaTeX in this document, you may consult online LaTeX references (or ask ChatGPT how to write your equation), but only to determine how to display an equation you have already formulated.\n",
    "\n",
    "As a reminder, we have already discussed the following metrics:\n",
    "\n",
    "Mean Squared Error:\n",
    "$$\\operatorname{MSE}=\\frac{1}{n}\\sum_{i=1}^n (y_i-\\hat y_i)^2.$$\n",
    "\n",
    "Root Mean Squared Error:\n",
    "$$\\operatorname{RMSE}=\\sqrt{\\frac{1}{n}\\sum_{i=1}^n (y_i-\\hat y_i)^2}.$$\n",
    "\n",
    "Mean Absolute Error:\n",
    "$$\\operatorname{MAE}=\\frac{1}{n}\\sum_{i=1}^n \\left \\vert y_i - \\hat y_i \\right \\vert.$$\n",
    "\n",
    "R-squared: \n",
    "$$R^2=1-\\frac{\\sum_{i=1}^n (y_i-\\hat y_i)^2}{\\sum_{i=1}^n (y_i - \\bar y)^2}.$$\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Part 2: Programming\n",
    "\n",
    "Recall how we used the function `fetch_openml` from scikit-learn to load OpenML data sets. Below is the code to load the Adult data set, described on OpenML's page [here](https://www.openml.org/search?type=data&status=active&id=45068)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "c:\\Users\\pthomas\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\sklearn\\datasets\\_openml.py:1022: FutureWarning: The default value of `parser` will change from `'liac-arff'` to `'auto'` in 1.4. You can set `parser='auto'` to silence this warning. Therefore, an `ImportError` will be raised from 1.4 if the dataset is dense and pandas is not installed. Note that the pandas parser may return different data types. See the Notes Section in fetch_openml's API doc for details.\n",
      "  warn(\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>workclass</th>\n",
       "      <th>fnlwgt</th>\n",
       "      <th>education</th>\n",
       "      <th>education-num</th>\n",
       "      <th>marital-status</th>\n",
       "      <th>occupation</th>\n",
       "      <th>relationship</th>\n",
       "      <th>race</th>\n",
       "      <th>sex</th>\n",
       "      <th>capital-gain</th>\n",
       "      <th>capital-loss</th>\n",
       "      <th>hours-per-week</th>\n",
       "      <th>native-country</th>\n",
       "      <th>income</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>25.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>226802.0</td>\n",
       "      <td>11th</td>\n",
       "      <td>7.0</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Machine-op-inspct</td>\n",
       "      <td>Own-child</td>\n",
       "      <td>Black</td>\n",
       "      <td>Male</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>38.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>89814.0</td>\n",
       "      <td>HS-grad</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Farming-fishing</td>\n",
       "      <td>Husband</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>50.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>28.0</td>\n",
       "      <td>Local-gov</td>\n",
       "      <td>336951.0</td>\n",
       "      <td>Assoc-acdm</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Protective-serv</td>\n",
       "      <td>Husband</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&gt;50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>44.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>160323.0</td>\n",
       "      <td>Some-college</td>\n",
       "      <td>10.0</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Machine-op-inspct</td>\n",
       "      <td>Husband</td>\n",
       "      <td>Black</td>\n",
       "      <td>Male</td>\n",
       "      <td>7688.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&gt;50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>103497.0</td>\n",
       "      <td>Some-college</td>\n",
       "      <td>10.0</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Own-child</td>\n",
       "      <td>White</td>\n",
       "      <td>Female</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>30.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48837</th>\n",
       "      <td>27.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>257302.0</td>\n",
       "      <td>Assoc-acdm</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Tech-support</td>\n",
       "      <td>Wife</td>\n",
       "      <td>White</td>\n",
       "      <td>Female</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>38.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48838</th>\n",
       "      <td>40.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>154374.0</td>\n",
       "      <td>HS-grad</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Machine-op-inspct</td>\n",
       "      <td>Husband</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&gt;50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48839</th>\n",
       "      <td>58.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>151910.0</td>\n",
       "      <td>HS-grad</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Widowed</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Unmarried</td>\n",
       "      <td>White</td>\n",
       "      <td>Female</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48840</th>\n",
       "      <td>22.0</td>\n",
       "      <td>Private</td>\n",
       "      <td>201490.0</td>\n",
       "      <td>HS-grad</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Never-married</td>\n",
       "      <td>Adm-clerical</td>\n",
       "      <td>Own-child</td>\n",
       "      <td>White</td>\n",
       "      <td>Male</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>20.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&lt;=50K</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48841</th>\n",
       "      <td>52.0</td>\n",
       "      <td>Self-emp-inc</td>\n",
       "      <td>287927.0</td>\n",
       "      <td>HS-grad</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Married-civ-spouse</td>\n",
       "      <td>Exec-managerial</td>\n",
       "      <td>Wife</td>\n",
       "      <td>White</td>\n",
       "      <td>Female</td>\n",
       "      <td>15024.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>40.0</td>\n",
       "      <td>United-States</td>\n",
       "      <td>&gt;50K</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>48842 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        age     workclass    fnlwgt     education  education-num  \\\n",
       "0      25.0       Private  226802.0          11th            7.0   \n",
       "1      38.0       Private   89814.0       HS-grad            9.0   \n",
       "2      28.0     Local-gov  336951.0    Assoc-acdm           12.0   \n",
       "3      44.0       Private  160323.0  Some-college           10.0   \n",
       "4      18.0           NaN  103497.0  Some-college           10.0   \n",
       "...     ...           ...       ...           ...            ...   \n",
       "48837  27.0       Private  257302.0    Assoc-acdm           12.0   \n",
       "48838  40.0       Private  154374.0       HS-grad            9.0   \n",
       "48839  58.0       Private  151910.0       HS-grad            9.0   \n",
       "48840  22.0       Private  201490.0       HS-grad            9.0   \n",
       "48841  52.0  Self-emp-inc  287927.0       HS-grad            9.0   \n",
       "\n",
       "           marital-status         occupation relationship   race     sex  \\\n",
       "0           Never-married  Machine-op-inspct    Own-child  Black    Male   \n",
       "1      Married-civ-spouse    Farming-fishing      Husband  White    Male   \n",
       "2      Married-civ-spouse    Protective-serv      Husband  White    Male   \n",
       "3      Married-civ-spouse  Machine-op-inspct      Husband  Black    Male   \n",
       "4           Never-married                NaN    Own-child  White  Female   \n",
       "...                   ...                ...          ...    ...     ...   \n",
       "48837  Married-civ-spouse       Tech-support         Wife  White  Female   \n",
       "48838  Married-civ-spouse  Machine-op-inspct      Husband  White    Male   \n",
       "48839             Widowed       Adm-clerical    Unmarried  White  Female   \n",
       "48840       Never-married       Adm-clerical    Own-child  White    Male   \n",
       "48841  Married-civ-spouse    Exec-managerial         Wife  White  Female   \n",
       "\n",
       "       capital-gain  capital-loss  hours-per-week native-country income  \n",
       "0               0.0           0.0            40.0  United-States  <=50K  \n",
       "1               0.0           0.0            50.0  United-States  <=50K  \n",
       "2               0.0           0.0            40.0  United-States   >50K  \n",
       "3            7688.0           0.0            40.0  United-States   >50K  \n",
       "4               0.0           0.0            30.0  United-States  <=50K  \n",
       "...             ...           ...             ...            ...    ...  \n",
       "48837           0.0           0.0            38.0  United-States  <=50K  \n",
       "48838           0.0           0.0            40.0  United-States   >50K  \n",
       "48839           0.0           0.0            40.0  United-States  <=50K  \n",
       "48840           0.0           0.0            20.0  United-States  <=50K  \n",
       "48841       15024.0           0.0            40.0  United-States   >50K  \n",
       "\n",
       "[48842 rows x 15 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from sklearn.datasets import fetch_openml                                       # OpenML contains many useful data sets, including Adult.\n",
    "\n",
    "adult = fetch_openml(name='adult', version=2, as_frame=True)                    # Fetch the Adult dataset. as_frame=True indicates that we're using pandas and makes adult.data a DataFrame (among other changes)\n",
    "df = adult.data                                                                 # Pull off the DataFrame, as we will manipulate it to append the labels\n",
    "df['income'] = adult.target                                                     # Add the target column (what we aim to predict)\n",
    "display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Browse the OpenML data sets [here](https://www.openml.org/search?type=data&status=active) and select one that could be used for a regression or classification problem and that we did not discuss in lecture. Then answer the following questions (read them before selecting a data set):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 1. [10 points] Describe the data set and provide a link to it. Describe the regression or classification problem that could be solved using this data set. What would the resulting regression or classification model be used for?\n",
    "\n",
    "This may already be described on OpenML. If so, you should state the answer in your own words.\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 2. [10 points] How many rows does the data set have? How many columns does it have (state whether your count includes the label column)? What do the first five columns represent (mean)? What are the labels? What are the types of the first five columns (e.g., discrete, continuous, binary, nominal, ordinal, string, text, image)?\n",
    "\n",
    "**Note**: The answers to some of these questions may be on the OpenML webpage, and may not require you to write any code. If you do write code, you do not need to include it in your submission.\n",
    "\n",
    "**Note**: If the data set's documentation does not describe all of the relevant information, it is acceptable to indicate that you are unsure. However, you should inspect the contents of the data set and try to make an educated guess.\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Replace this text with your answer.</font>\n",
    "\n",
    "---\n",
    "\n",
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document.</font>\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 3. [10 points] Provide code to load the data set, display the data frame, and display the results of calling the Pandas `describe` function on the data frame.\n",
    "\n",
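    "**Note**: When using `fetch_openml`, you can use `version='active'` (the default) to select an active version of the data set. If several versions are active, scikit-learn selects the oldest one, not the latest, so specifying an exact version number (e.g., `version=2`, as in the Adult example above) is more reproducible.\n",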
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Delete this line and enter your initial answer in the code cell below.</font>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ENTER YOUR INITIAL ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Delete this line and enter your updated answer in the code cell below.</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ENTER YOUR UPDATED ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 4. [10 points] Provide code to compute the correlation matrix of the first five **numerical features** (and the label if it is numerical). Plot this correlation matrix as a heatmap (a grid, with colors representing the correlation coefficients).\n",
    "\n",
    "For this problem, we recommend referencing the course notebook \"2.1 Pandas and Datasets.ipynb\", which provides an example of how the correlation matrix can be displayed using seaborn. You may reference this code when constructing your initial answer.\n",
    "\n",
    "**Note**: You may hard-code the selection of columns used to obtain the first five numerical features. If your data set has fewer than five numerical features, include all of them.\n",
    "\n",
    "***Initial Answer***\n",
    "\n",
    "<font color=\"blue\">Delete this line and enter your initial answer in the code cell below.</font>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ENTER YOUR INITIAL ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "***Updated Answer***\n",
    "\n",
    "<font color=\"Green\">Replace this text with your response to the solution document. Note: You may also include updated code in the code block below.</font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ENTER YOUR UPDATED ANSWER HERE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Part 1 Question 6 Hint**\n",
    "<!-- Consider ways to further emphasize or de-emphasize larger or smaller errors, or to perhaps increase the emphasis on over-predictions relative to under-predictions. -->"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The solutions can be found here: [https://people.cs.umass.edu/~pthomas/courses/COMPSCI_389_Spring2024/Homework%201%20Solutions.ipynb](https://people.cs.umass.edu/~pthomas/courses/COMPSCI_389_Spring2024/Homework%201%20Solutions.ipynb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}